FFTNet: A Real-Time Speaker-Dependent Neural Vocoder
Authors
Abstract
We introduce FFTNet, a deep learning approach to synthesizing audio waveforms. Our approach builds on the recent WaveNet project, which showed that it is possible to synthesize a natural-sounding audio waveform directly with a deep convolutional neural network. FFTNet offers two improvements over WaveNet. First, it is substantially faster, allowing real-time synthesis of audio waveforms. Second, when used as a vocoder, the resulting speech sounds more natural, as measured by a "mean opinion score" test.
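The FFTNet architecture behind this abstract repeatedly splits its input buffer in half, combines the two halves with a pair of 1x1 convolutions, and applies a nonlinearity, so that log2(n) layers collapse an n-sample context into a single output frame. Below is a minimal NumPy sketch of that core operation; the function name, shapes, and random weights are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def fftnet_layer(x, w_l, w_r):
    """One FFTNet-style layer (sketch): split the input buffer into an
    older half and a newer half, combine them with two 1x1 weight
    matrices, and apply a ReLU.
    x:   (channels, n) input buffer for this layer
    w_l: (channels, channels) weights for the left (older) half
    w_r: (channels, channels) weights for the right (newer) half
    Returns a (channels, n // 2) output, halving the context size."""
    n = x.shape[1]
    x_l, x_r = x[:, : n // 2], x[:, n // 2 :]
    z = w_l @ x_l + w_r @ x_r      # two 1x1 convolutions, summed
    return np.maximum(z, 0.0)      # ReLU

# Stacking log2(n) such layers collapses an n-sample context to one frame:
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # 8 channels, 16-sample context
for _ in range(4):                 # 16 -> 8 -> 4 -> 2 -> 1
    c = x.shape[0]
    x = fftnet_layer(x,
                     0.1 * rng.standard_normal((c, c)),
                     0.1 * rng.standard_normal((c, c)))
print(x.shape)  # (8, 1)
```

Because each layer needs only half the buffer of the one before it, the per-sample cost grows logarithmically rather than linearly with the receptive field, which is the source of the speedup over WaveNet claimed above.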
Similar Papers
Real-time control of a DNN-based articulatory synthesizer for silent speech conversion: a pilot study
This article presents a pilot study on the real-time control of an articulatory synthesizer based on a deep neural network (DNN), in the context of a silent speech interface. The underlying hypothesis is that a silent speaker could benefit from real-time audio feedback to regulate his or her own production. In this study, we use 3D electromagnetic articulography (EMA) to capture speech articulation, a...
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be contro...
D3.5 Final evaluation report
This deliverable provides a summary of several evaluations of our synthesis system and describes experiments in the synthesis of expressive speech conducted in WP3. A new vocoding method was developed, based on predicting time-domain glottal excitation waveforms directly from acoustic features with a deep neural network (DNN). The method was first evaluated with normal speaking style and then u...
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron. We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1, but con...
A very low bit rate speech coder using HMM with speaker adaptation
This paper describes a speaker adaptation technique for an HMM-based phonetic vocoder. In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems with this vocoder is that the voice characteristics of the synthetic speech depend on H...